Abstract

In this paper, we consider Multi-User MIMO (MU-MIMO) scheduling in the 3GPP LTE-Advanced
(3GPP LTE-A) cellular uplink. The 3GPP LTE-A cellular network is one of the two true fourth
generation (4G) cellular networks as per the International Telecommunication Union and is expected to be
the most widely deployed 4G cellular network. The 3GPP LTE-A uplink allows for precoded multi-stream
(precoded MIMO) transmission from each scheduled user and also allows flexible multi-user
scheduling wherein multiple users can be assigned the same time-frequency resource. However,
exploiting these features is made challenging by certain practical constraints that have been imposed
in order to maintain a low signaling overhead. We show that while the scheduling problem in the
3GPP LTE-A cellular uplink is NP-hard, it can be formulated as the maximization of a submodular set
function subject to one matroid and multiple knapsack constraints. We then propose constant-factor
polynomial-time approximation algorithms and demonstrate their superior performance via simulations.
An interesting corollary that follows from our result is that a popular transmit antenna selection problem
in point-to-point MIMO communications can be posed as a submodular maximization problem that is
NP-hard but can be approximately solved (with at least half optimality) by a simple greedy algorithm.

Keywords: Knapsack, Multi-user scheduling, Matroid, NP-hard, Resource allocation, Submodular 
maximization. 
1 Introduction



The 3GPP LTE-A based cellular network [1] together with the IEEE 802.16m based cellular network are
the only two cellular networks classified as fourth generation cellular networks by the International
Telecommunication Union. Some key attributes that a 4G uplink must possess are the ability to support a peak
spectral efficiency of 15 bps/Hz and a cell average spectral efficiency of 2 bps/Hz, ultra-low latency and
bandwidths of up to 100 MHz. To achieve these ambitious specifications, the 3GPP LTE-A uplink is based
on a modified form of orthogonal frequency-division multiplexing based multiple access (OFDMA) [1].
In addition, it allows precoded multi-stream (precoded MIMO) transmission from each scheduled user as
well as flexible multi-user scheduling. Notice that while OFDMA itself allows for significant spectral
efficiency gains via channel dependent frequency domain scheduling, multi-user multi-stream communication
promises substantially higher degrees of freedom [2,3].

Our focus in this paper is on the 3GPP LTE-A uplink (UL) and in particular on MU-MIMO scheduling
for the LTE-A UL. The vast majority of the 4G cellular systems that will be deployed will be based
on the 3GPP LTE-A standard [1]. This standard is an enhancement of the basic LTE standard, which is
referred to in the industry as Release 8 [4], and deployments conforming to Release 8 are already
underway. The scheduling in the LTE-A UL is done in the frequency domain, where in each scheduling
interval the scheduler assigns one or more resource blocks (RBs) to each scheduled user. Each RB contains
a pre-defined set of consecutive subcarriers and is the minimum allocation unit. The UL in the LTE-A
network employs a modified form of OFDMA, referred to as DFT-Spread-OFDMA. Here, each user
employs a DFT precoder to spread its data symbols before placing them on its assigned RBs. In Fig. 1, we
depict a feasible allocation in LTE-A UL MU scheduling. Notice that each user can be assigned up to two
mutually non-contiguous chunks, where each chunk is a set of contiguous RBs. This constraint together
with the DFT spreading done by each user ensures that the peak-to-average power ratio of each user is kept
in check. Notice also that there can be partial overlaps (in terms of assigned RBs) among co-scheduled
users. Moreover, since the LTE-A base-station is expected to deploy advanced receivers, it is reasonable
to assume that there is no explicit limit on the number of users that can be co-scheduled on an RB. The
LTE-A UL also allows for precoded MIMO transmission from each scheduled user in order to achieve even
higher data rates. While enabling multi-stream transmission can boost the user rate, precoding confers
the ability to steer the transmitted streams along suitable directions (in a signal space). In single-user
(SU) MIMO scheduling the suitable directions are the dominant eigen-directions of the scheduled user's
channel, whereas in MU-MIMO scheduling the suitable directions also depend on the channels of the other
overlapping co-scheduled users.

Some more practical constraints have been imposed on UL scheduling in 3GPP LTE-A. These include
ones that seek to minimize the signaling overhead, such as allowing each scheduled user to transmit with
only one power level (or power spectral density (PSD)) on all its assigned RBs,^1 as well as enforcing that
a scheduled user can be assigned no more than one precoding matrix in a scheduling interval. In addition,
constraints that aim to mitigate intercell interference are also imposed, along with those that arise due to
the limited capacity of the downlink control channel on which the scheduling decisions are conveyed to the
users.

The goal of this work is to design practical uplink MU-MIMO resource allocation algorithms for the 
LTE-A cellular network, where the term resource refers to RBs as well as precoding matrices. In particular, 
we consider the design of resource allocation algorithms via weighted sum rate utility maximization that 
account for finite user queues (buffers) and finite precoding codebooks. In addition, the designed algorithms 
comply with all the aforementioned practical constraints on the assignment of RBs and precoders to the 
scheduled users. Our main contributions are as follows: 

1. We first assume that users can employ ideal Gaussian codes and that the base-station (BS) can 
employ an optimal receiver. We then enforce user rates to lie in a fundamental achievable rate region 
of the multiple access channel which is a polymatroid and show that the resulting resource allocation 
problem is NP-hard. We prove that the resource allocation problem can however be formulated as the 
maximization of a monotonic sub-modular set function subject to one matroid and multiple knapsack 
constraints, and can be solved using a recently discovered polynomial time randomized constant-factor 
approximation algorithm [5,6]. We also adapt a simpler deterministic greedy algorithm and show 
that it yields a constant-factor approximation for several scenarios of interest. 

2. We then consider practical scenarios where users employ codes constructed over finite alphabets. In
this case the mutual information terms needed to specify an achievable rate region do not have closed
form expressions. On the other hand, the achievable rate region obtained for Gaussian alphabets can
be a loose outer bound. Consequently, we obtain a tighter outer bound which is also a polymatroid.
As a result all algorithms developed for Gaussian alphabets can be reused after simple modifications.

3. We provide simulation results using a realistic channel model and employ a data-dependent upper
bound to benchmark the performance of our algorithms. We show that our LTE-A scheduling
algorithm has a good average performance, within 65-75% of the upper bound.

^1 This PSD is implicitly determined by the number of RBs assigned to that user, i.e., the user divides its
total power equally among all its assigned RBs subject possibly to a spectral mask constraint.

1.1 Related Work 

Resource allocation for OFDMA networks has received significant attention [7-11], with most of it
directed towards the downlink. A large fraction of the resource allocation problems hitherto considered
are single-user (SU) scheduling problems, which attempt to maximize a system utility under the constraint
that scheduled users can only be assigned non-overlapping subcarriers. These problems have been
formulated as continuous optimization problems, and since they are in general non-linear and non-convex, many
approaches including those based on game theory [12], dual decomposition [7] and the analysis of
optimality conditions [13] have been developed. MU-MIMO in the downlink has been considered in [14, 15], where
capacity scaling under imperfect channel estimation and/or quantized channel state information feedback
is investigated but the design of approximation algorithms for resource allocation is not considered. Recent
works have focused on emerging cellular standards and have formulated the resource allocation problems
as constrained integer programs. Prominent examples are [10], [16], which consider the design of downlink
SU-MIMO schedulers for LTE and LTE-A systems, respectively, and derive constant-factor approximation
algorithms. On the other hand, resource allocation for the DFT-Spread-OFDMA uplink has garnered
much less attention, with [17-19] being the recent examples. In particular, [17, 18] show that the
single-user UL LTE (Release 8) scheduling problem is NP-hard and provide constant-factor
approximation algorithms, whereas [19] considers SU-MIMO LTE-A scheduling. The algorithms in [17-19] cannot
incorporate MU scheduling and also cannot incorporate knapsack constraints. MU scheduling for the LTE
(Release 8) UL is considered in detail in [20]. However, we emphasize that certain additional constraints
imposed on LTE (Release 8) MU scheduling essentially ensure that algorithms optimized for LTE UL
scheduling are unsuitable for LTE-A scheduling, whereas algorithms optimized for LTE-A UL scheduling
(as presented in this paper) are not even applicable to LTE UL scheduling since they yield infeasible
solutions. To the best of our knowledge the design of approximation algorithms for MU-MIMO scheduling in
the LTE-A uplink has not been considered before.
2 MU-MIMO Scheduling in the LTE-A UL



Consider a single cell with $K$ users and one base-station (BS) which is assumed to have $N_r \ge 1$ receive
antennas. Suppose that user $k$ has $N_t \ge 1$ transmit antennas and its power budget is $P_k$. Let
$\mathbf{H}_k^{(n)}$ denote the $N_r \times N_t$ channel matrix seen by the BS from user $k$ on RB $n$. We let
$N$ denote the total number of RBs. For convenience and without loss of generality, in the following analysis
we assume each RB to have unit size.

We consider the problem of scheduling users in the frequency domain in a given scheduling interval.
Let $\alpha_k$, $1 \le k \le K$ denote the non-negative weight of the $k$-th user, which is an input to the
scheduling algorithm and is updated using the output of the scheduling algorithm in every scheduling interval,
say according to the proportional fairness rule [21]. Letting $r_k$ denote the rate assigned to the $k$-th user
(in bits per $N$ RBs), we consider the following weighted sum rate utility maximization problem,

$$\max \sum_{1 \le k \le K} \alpha_k r_k, \qquad (1)$$

where the maximization is over the assignment of RBs, precoders and powers to the users subject to:

• Decodability constraint: The rates assigned to the scheduled users should be decodable by the
base-station receiver. Notice that unlike SU-MIMO, MU-MIMO scheduling allows for multiple users
to be assigned the same RB. As a result the rate that can be achieved for user $k$ need not be only
a function of the RBs, precoders and powers assigned to the $k$-th user but can also depend on those
assigned to the other users.

• One precoder and one power level per user: Each scheduled user can be assigned any one
precoding matrix from a finite codebook of such matrices $\mathcal{W}$. In addition, each scheduled user can
transmit with only one power level (or power spectral density (PSD)) on all its assigned RBs. This
PSD is implicitly determined by the number of RBs assigned to that user, i.e., the user divides its
total power equally among all its assigned RBs.

• At most two chunks per user: The set of RBs assigned to each scheduled user should form
at most two mutually non-contiguous chunks, where each chunk is a set of contiguous RBs. This
constraint is a compromise between the need to provide enough scheduling flexibility and the need
to keep the per-user peak-to-average-power ratio (PAPR) in check. A feasible RB allocation and
co-scheduling of users in the LTE-A multi-user uplink is depicted in Fig. 1.

• Finite buffers: We let $Q_k$ denote the size in bits of the queue (buffer) associated with the $k$-th user.
Thus, the rate assigned to user $k$ cannot exceed $Q_k$.

• Control channel overhead and interference limit constraints: Every user that is scheduled 
on at least one RB must be informed about its transmission rate and the set of RBs on which it 
must transmit along with the precoder it should employ. This information is sent on the DL control 
channel of limited capacity which in turn imposes a limit on the set of users that can be scheduled. On 
the other hand, the scheduling decisions that are made must respect limits imposed to mitigate the 
interference caused to other cells. In [20] it is shown that the control channel overhead constraints can 
be modeled as binary column-sparse knapsack constraints, whereas the interference limit constraints 
can be modeled as generic knapsack constraints. 

We will formulate the optimization problem in (1) as the maximization of a monotonic submodular set
function subject to one matroid and multiple knapsack constraints.

Towards this end, let $e = (u, c, \mathbf{W})$ denote an element, where $1 \le u \le K$ denotes a user,
$\mathbf{W} \in \mathcal{W}$ denotes a precoder from a finite codebook $\mathcal{W}$ and $c \in \mathcal{C}$
denotes a valid assignment of RBs chosen from the set $\mathcal{C}$ containing all possible valid assignments.
In particular, each $c$ is a vector with binary-valued ($\{0, 1\}$) elements and we say an RB $i$ belongs to $c$
($i \in c$) if $c$ contains a one in its $i$-th position, i.e., $c(i) = 1$. Note that the non-zero entries in each
$c \in \mathcal{C}$ form at most two non-contiguous chunks. Next, we let
$\mathcal{E} = \{e = (u, c, \mathbf{W}) : 1 \le u \le K,\ c \in \mathcal{C},\ \mathbf{W} \in \mathcal{W}\}$
denote the ground set of all possible such elements. For any such element we adopt the convention that

$$e = (u, c, \mathbf{W}) \Rightarrow c_e = c;\ \mathbf{W}_e = \mathbf{W};\ u_e = u;\ \alpha_e = \alpha_u,\ Q_e = Q_u,\ \mathbf{H}_e^{(n)} = \mathbf{H}_u^{(n)}\ \forall\, n. \qquad (2)$$

In addition, we let $p_e$ denote the power level (PSD) associated with the element $e = (u, c, \mathbf{W})$. This PSD
can be computed as $p_e = P_u / \mathrm{size}(c)$, where $\mathrm{size}(c)$ denotes the number of ones (number of
RBs) in $c$. Let $\alpha_e, Q_e$ denote the weight and buffer (queue) size associated with the element $e$,
respectively, and let $r_e$ denote the rate associated with the element $e$. We will use the phrase selecting an
element $e$ to imply that the user $u_e$ is scheduled to transmit on the RBs indicated in $c_e$ with PSD $p_e$ and
precoder $\mathbf{W}_e$. Thus, the constraints of one precoder and one power level per user along with at most two
chunks per user can be imposed by allowing the scheduler to select any subset of elements
$\mathcal{U} \subseteq \mathcal{E}$ such that $\sum_{e \in \mathcal{U}} 1\{u_e = u\} \le 1$ for each
$u \in \{1, \dots, K\}$, where $1\{\cdot\}$ denotes the indicator function. Accordingly, we define a family of
subsets of $\mathcal{E}$, denoted by $\mathcal{I}$, as

$$\mathcal{I} = \Big\{ \mathcal{U} \subseteq \mathcal{E} : \sum_{e \in \mathcal{U}} 1\{u_e = u\} \le 1,\ \forall\, 1 \le u \le K \Big\}. \qquad (3)$$



We next consider the decodability constraint after first assuming that each user can employ ideal 
Gaussian codes (i.e., codes for which the coded modulated symbols can be regarded as i.i.d. Gaussian) 
and that the BS can employ an optimal receiver. Subsequently, we will consider finite input alphabets. 
Recall that in DFT-Spread-OFDMA each user linearly transforms its codeword using a DFT matrix in 
order to reduce the PAPR. Note, however, that under the assumption of ideal Gaussian codes the DFT 
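spreading operation performed by each user on its codeword has no effect. This is because the i.i.d. Gaussian
distribution is invariant with respect to any unitary linear transformation. Accordingly, we define a set
function $f : 2^{\mathcal{E}} \to \mathbb{R}_+$ as

$$f(\mathcal{U}) = \sum_{n=1}^{N} \log \Big| \mathbf{I} + \sum_{e \in \mathcal{U}} p_e c_e(n) \mathbf{H}_e^{(n)} \mathbf{W}_e \big(\mathbf{H}_e^{(n)} \mathbf{W}_e\big)^{\dagger} \Big| \qquad (4)$$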



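for all $\mathcal{U} \subseteq \mathcal{E}$. It can be verified that $f(\cdot)$ defined in (4) is a submodular set
function (see for instance Proposition 2 in [22]), i.e.,

$$f(\mathcal{A} \cup \{e\}) - f(\mathcal{A}) \ge f(\mathcal{B} \cup \{e\}) - f(\mathcal{B}),$$

for all $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{E}$ and $e \in \mathcal{E}$. Further, since it is
monotonic (i.e., $f(\mathcal{A}) \le f(\mathcal{B})$, $\forall\, \mathcal{A} \subseteq \mathcal{B}$) and normalized
($f(\emptyset) = 0$, where $\emptyset$ denotes the empty set), we can assert that $f(\cdot)$ is a rank function.
Consequently, for each $\mathcal{U} \subseteq \mathcal{E}$ the region

$$\mathcal{P}(\mathcal{U}, f) = \Big\{ \mathbf{r} = [r_e]_{e \in \mathcal{U}} \in \mathbb{R}_+^{|\mathcal{U}|} : \sum_{e \in \mathcal{A}} r_e \le f(\mathcal{A}),\ \forall\, \mathcal{A} \subseteq \mathcal{U} \Big\} \qquad (5)$$

is a polymatroid [3]. Note that for each $\mathcal{U} \subseteq \mathcal{E}$, $\mathcal{P}(\mathcal{U}, f)$ is the
fundamental achievable rate region of a multiple access channel [3,23]. In particular, each rate-tuple
$\mathbf{r}_{\mathcal{U}} = [r_e]_{e \in \mathcal{U}} \in \mathcal{P}(\mathcal{U}, f)$ is achievable [3] in the sense
that for any rate assignment arbitrarily close to $\mathbf{r}_{\mathcal{U}}$ (i.e.,
$\mathbf{r} : \mathbf{r} < \mathbf{r}_{\mathcal{U}}$) there exist coding and decoding schemes that can meet any
acceptable level of error probability. Thus, we can impose decodability constraints by imposing that the assigned
rate-tuple satisfy $\mathbf{r}_{\mathcal{U}} \in \mathcal{P}(\mathcal{U}, f)$ for any selected subset
$\mathcal{U} \subseteq \mathcal{E}$.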
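The monotonicity and diminishing-returns properties claimed for the log-determinant function in (4) can be checked by brute force on a small instance. The sketch below is only an illustration under simplifying assumptions that are not in the text: a single receive antenna and single-antenna users, so that each matrix product in (4) collapses to a scalar gain and the determinant to one plus the received power. The element list, gains and the names `f` and `is_monotone_submodular` are hypothetical.

```python
import itertools
import math

# Hypothetical toy instance (not from the paper): N = 2 RBs, scalar channels.
# Each element is (p_e, c_e, per-RB channel gains), with c_e a 0/1 tuple.
ELEMENTS = [
    (1.0, (1, 0), (0.9, 0.2)),
    (0.5, (1, 1), (0.4, 1.3)),
    (2.0, (0, 1), (0.7, 0.6)),
    (0.5, (1, 1), (1.1, 0.3)),
]
NUM_RBS = 2


def f(subset):
    """Scalar special case of (4): sum over RBs of log2(1 + received power)."""
    total = 0.0
    for n in range(NUM_RBS):
        power = sum(
            ELEMENTS[e][0] * ELEMENTS[e][1][n] * ELEMENTS[e][2][n] ** 2
            for e in subset
        )
        total += math.log2(1.0 + power)
    return total


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in itertools.combinations(items, size):
            yield frozenset(combo)


def is_monotone_submodular(func, ground, tol=1e-12):
    """Exhaustively test f(A) <= f(B) and the diminishing-returns inequality."""
    for b in subsets(ground):
        for a in subsets(b):
            if func(a) > func(b) + tol:
                return False
            for e in ground - b:
                gain_a = func(a | {e}) - func(a)
                gain_b = func(b | {e}) - func(b)
                if gain_a < gain_b - tol:
                    return False
    return True


GROUND = frozenset(range(len(ELEMENTS)))
print(is_monotone_submodular(f, GROUND))
```

The exhaustive check is exponential in the ground-set size, so it is meant only as a sanity test on toy inputs, not as part of a scheduler.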

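Next, in order to impose buffer (queue) constraints, we define

$$\mathcal{B}(\mathcal{U}) = \big\{ \mathbf{r} = [r_e]_{e \in \mathcal{U}} \in \mathbb{R}_+^{|\mathcal{U}|} : 0 \le r_e \le Q_e,\ \forall\, e \in \mathcal{U} \big\},\ \mathcal{U} \subseteq \mathcal{E}. \qquad (6)$$

Thus, for a (tentative) choice $\mathcal{U}$, we can satisfy both decodability and buffer constraints by assigning only
rate-tuples that lie in the region $\mathcal{P}(\mathcal{U}, f) \cap \mathcal{B}(\mathcal{U})$. Clearly among all
such rate-tuples we are interested in the one that maximizes the weighted sum rate. Hence, without loss of
optimality with respect to (1), with each $\mathcal{U} \subseteq \mathcal{E}$ we can associate a rate-tuple in
$\mathcal{P}(\mathcal{U}, f) \cap \mathcal{B}(\mathcal{U})$ that maximizes the weighted sum rate.
Consequently, we define the following set function that determines the reward obtained upon selecting any
subset of $\mathcal{E}$. We define the set function $h : 2^{\mathcal{E}} \to \mathbb{R}_+$ as

$$h(\mathcal{U}) = \max_{\mathbf{r} \in \mathcal{P}(\mathcal{U}, f) \cap \mathcal{B}(\mathcal{U})} \Big\{ \sum_{e \in \mathcal{U}} \alpha_e r_e \Big\},\ \forall\, \mathcal{U} \subseteq \mathcal{E}. \qquad (7)$$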



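Leveraging the arguments made in [20], we can represent the control channel overhead constraints as
$L$ packing (knapsack) constraints such that a subset $\mathcal{U} \subseteq \mathcal{E}$ is feasible if and only if

$$\mathbf{A}_C \mathbf{x}_{\mathcal{U}} \le \mathbf{1}_L, \qquad (8)$$

where $\mathbf{A}_C \in \{0, 1\}^{L \times |\mathcal{E}|}$ is a binary-valued matrix, $\mathbf{x}_{\mathcal{U}}$ is the
binary incidence vector of $\mathcal{U}$ and $\mathbf{1}_L$ is a length-$L$ vector of ones. Moreover, the total
number of non-zero entries in any column of $\mathbf{A}_C$ is no more than an integer $\Delta \ge 1$ which denotes the
column sparsity level. On the other hand, the interference limit constraints can be represented as

$$\mathbf{A}_I \mathbf{x}_{\mathcal{U}} \le \mathbf{1}_M, \qquad (9)$$

where $\mathbf{A}_I \in [0, 1]^{M \times |\mathcal{E}|}$ and $\mathbf{1}_M$ is a length-$M$ vector of ones.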






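Summarizing the aforementioned results, we have formulated (1) as the following optimization problem:

$$\max_{\mathcal{U} \in \mathcal{I}} \{ h(\mathcal{U}) \} \quad \text{s.t.} \quad \mathbf{A}_I \mathbf{x}_{\mathcal{U}} \le \mathbf{1}_M;\ \mathbf{A}_C \mathbf{x}_{\mathcal{U}} \le \mathbf{1}_L. \qquad (10)$$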



In (10) we regard $M, \Delta$ as constants that are arbitrarily fixed, whereas $L$ can scale polynomially in
the cardinality of the ground set $|\mathcal{E}|$. Then, for a given number of users $K$, number of RBs $N$ and
the codebook cardinality $|\mathcal{W}|$ (which together fix $|\mathcal{E}|$), an instance (or input) of the problem
in (10) consists of a set of non-negative user weights $\{\alpha_u\}$ and queue sizes $\{Q_u\}$, per-user per-RB
channel matrices $\{\mathbf{H}_u^{(n)}\} : 1 \le u \le K,\ 1 \le n \le N$, a codebook $\mathcal{W}$ (of cardinality
$|\mathcal{W}|$) along with a column-sparse matrix $\mathbf{A}_C \in \{0, 1\}^{L \times |\mathcal{E}|}$ and any matrix
$\mathbf{A}_I \in [0, 1]^{M \times |\mathcal{E}|}$. The output is a subset $\mathcal{U} \subseteq \mathcal{E}$ along
with a rate-tuple $\mathbf{r}_{\mathcal{U}}$. Note that $|\mathcal{E}|$ is $O(K|\mathcal{W}|N^4)$.
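To make the size of the ground set concrete, the following sketch enumerates every RB assignment whose ones form at most two non-contiguous chunks and counts the resulting elements. It is a minimal illustration, assuming that the empty assignment is excluded (an unscheduled user simply has no selected element); the function names and the closed-form cross-check are ours, not the paper's.

```python
import math


def valid_assignments(num_rbs):
    """All non-empty 0/1 RB vectors whose ones form at most two chunks."""
    found = set()
    for s1 in range(num_rbs):
        for e1 in range(s1, num_rbs):
            one_chunk = [0] * num_rbs
            for i in range(s1, e1 + 1):
                one_chunk[i] = 1
            found.add(tuple(one_chunk))
            # A second chunk must be separated from the first by >= 1 empty RB.
            for s2 in range(e1 + 2, num_rbs):
                for e2 in range(s2, num_rbs):
                    two_chunks = list(one_chunk)
                    for i in range(s2, e2 + 1):
                        two_chunks[i] = 1
                    found.add(tuple(two_chunks))
    return sorted(found)


def closed_form_count(num_rbs):
    """Cross-check: pick 2 (one chunk) or 4 (two chunks) of the N+1 boundaries."""
    return math.comb(num_rbs + 1, 2) + math.comb(num_rbs + 1, 4)


def ground_set_size(num_users, num_rbs, codebook_size):
    """Number of elements e = (u, c, W) in the ground set."""
    return num_users * len(valid_assignments(num_rbs)) * codebook_size


print(len(valid_assignments(4)), ground_set_size(2, 4, 3))
```

The two-chunk term grows like $N^4/24$, which is where the $N^4$ factor in the bound on the ground-set size comes from.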

We first introduce the following two results that will be invoked later. 

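Lemma 1. The family of subsets $\mathcal{I}$ defined in (3) is an independence family and
$(\mathcal{E}, \mathcal{I})$ is a partition matroid.

Proof. First we note that $\mathcal{I}$ is downward closed, i.e., if $\mathcal{A} \in \mathcal{I}$ then any
$\mathcal{B} \subseteq \mathcal{A}$ satisfies $\mathcal{B} \in \mathcal{I}$. Next, let $\mathcal{E}^{(k)}$ denote the
set of all $e \in \mathcal{E} : u_e = k$ and notice that $\mathcal{E}^{(k)} \cap \mathcal{E}^{(j)} = \emptyset$,
$\forall\, k \ne j$. Then, note that $\mathcal{I}$ can also be defined as
$\mathcal{U} \in \mathcal{I} \Leftrightarrow |\mathcal{U} \cap \mathcal{E}^{(k)}| \le 1\ \forall\, 1 \le k \le K$.
Further, it can be verified that $\mathcal{I}$ satisfies the exchange property, i.e., for any
$\mathcal{A}, \mathcal{B} \in \mathcal{I}$ such that $|\mathcal{A}| > |\mathcal{B}|$ we have that
$\exists\, e \in \mathcal{A} \setminus \mathcal{B}$ such that $\mathcal{B} \cup \{e\} \in \mathcal{I}$. Thus,
we can conclude that $(\mathcal{E}, \mathcal{I})$ is a partition matroid. □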
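The two properties used in this proof can be confirmed exhaustively on a toy ground set. The sketch below assumes a hypothetical mapping `USER_OF` from element labels to users and checks both the "at most one element per user" rule and the exchange property by enumeration.

```python
import itertools

# Hypothetical ground set: element label -> user index u_e.
USER_OF = {"e1": 1, "e2": 1, "e3": 2, "e4": 3, "e5": 3}


def is_independent(subset, user_of):
    """A subset is independent iff no user owns more than one selected element."""
    users = [user_of[e] for e in subset]
    return len(users) == len(set(users))


def exchange_property_holds(ground, user_of):
    """Check: |A| > |B| implies some e in A \\ B keeps B + e independent."""
    independent_sets = [
        frozenset(s)
        for size in range(len(ground) + 1)
        for s in itertools.combinations(ground, size)
        if is_independent(s, user_of)
    ]
    for a in independent_sets:
        for b in independent_sets:
            if len(a) > len(b):
                if not any(is_independent(b | {e}, user_of) for e in a - b):
                    return False
    return True


print(is_independent({"e1", "e3"}, USER_OF))
print(exchange_property_holds(list(USER_OF), USER_OF))
```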

The proof of the following lemma follows from basic definitions [24] and is skipped for brevity. 



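Lemma 2. The region $\mathcal{P}(\mathcal{U}, f) \cap \mathcal{B}(\mathcal{U})$, $\forall\, \mathcal{U} \subseteq \mathcal{E}$
is a polymatroid characterized by the rank function $f' : 2^{\mathcal{U}} \to \mathbb{R}_+$ where

$$f'(\mathcal{A}) = \min_{\mathcal{R} \subseteq \mathcal{A}} \Big\{ f(\mathcal{R}) + \sum_{e \in \mathcal{A} \setminus \mathcal{R}} Q_e \Big\},\ \forall\, \mathcal{A} \subseteq \mathcal{U}. \qquad (11)$$

We are now ready to offer our main result. Let us assume that computing $h(\mathcal{U})$ for any
$\mathcal{U} \subseteq \mathcal{E}$ incurs a unit cost (or equivalently is given by an oracle in a single query).
We will show that even under this assumption the problem in (10) is NP-hard.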

Theorem 1. The optimization problem in (10) is NP-hard and is the maximization of a monotonic
submodular set function subject to one matroid and multiple knapsack constraints.

Proof. We will first show that (10) is the maximization of a monotonic submodular set function subject
to one matroid and multiple knapsack constraints. Invoking Lemma 1, it suffices to show that the function
$h(\cdot)$ is a monotonic submodular set function. From the definition of $h(\cdot)$ in (7) it is readily seen that it
is monotonic, i.e., $h(\mathcal{U}') \le h(\mathcal{U})$, $\forall\, \mathcal{U}' \subseteq \mathcal{U} \subseteq \mathcal{E}$.
Let $o(\cdot, \cdot)$ denote an ordering function such that for any subset $\mathcal{U} \subseteq \mathcal{E}$,
$o(\mathcal{U}, k)$ is the element having the $k$-th largest weight among the elements in $\mathcal{U}$. Hence we
have that $\alpha_{o(\mathcal{U},1)} \ge \alpha_{o(\mathcal{U},2)} \ge \cdots \ge \alpha_{o(\mathcal{U},|\mathcal{U}|)}$.
Further, let us adopt the convention that for any subset $\mathcal{U} \subseteq \mathcal{E}$,
$o(\mathcal{U}, k) = \emptyset$, $\forall\, k \ge |\mathcal{U}| + 1$ and $\alpha_{\emptyset} = 0$. We can now invoke
Lemma 2 together with the important property that the rate-tuple in any polymatroid that maximizes the weighted sum
is determined by the corner point of that polymatroid in which the elements are arranged in the non-increasing order
of their weights [3, 24]. Thus, we can express $h(\cdot)$ as

$$h(\mathcal{U}) = \sum_{k=1}^{|\mathcal{U}|} \big(\alpha_{o(\mathcal{U},k)} - \alpha_{o(\mathcal{U},k+1)}\big) f'\big(\{o(\mathcal{U}, 1), \cdots, o(\mathcal{U}, k)\}\big),\ \forall\, \mathcal{U} \subseteq \mathcal{E}. \qquad (12)$$

A key step is to express (12) as

$$h(\mathcal{U}) = \sum_{k=1}^{|\mathcal{E}|} \big(\alpha_{o(\mathcal{E},k)} - \alpha_{o(\mathcal{E},k+1)}\big) f'\big(\{o(\mathcal{E}, 1), \cdots, o(\mathcal{E}, k)\} \cap \mathcal{U}\big),\ \forall\, \mathcal{U} \subseteq \mathcal{E}. \qquad (13)$$

Note that in (13) the number of terms in the summation as well as the non-negative combining weights
$\{\alpha_{o(\mathcal{E},k)} - \alpha_{o(\mathcal{E},k+1)}\}$ do not depend on $\mathcal{U}$. It can then be verified
that since $f'(\cdot)$ is monotonic and submodular, each set function
$f'(\{o(\mathcal{E}, 1), \cdots, o(\mathcal{E}, k)\} \cap \cdot)$ is also a monotonic and submodular set function.
From (13) it can now be inferred that since $h(\cdot)$ is a weighted sum of monotonic submodular functions in which
all the combining weights are non-negative, it is a monotonic submodular set function. Thus, (10) is the
maximization of a monotonic submodular set function subject to one matroid and multiple knapsack constraints.
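The corner-point evaluation of the reward function can be illustrated numerically. The sketch below is a toy, not the authors' implementation: it assumes a single RB with scalar channels, so that the rate function is log2 of one plus the summed received power, and it assumes the queue-truncated rank function takes the standard form min over R of f(R) plus the queue sizes of the elements left out of R. The dictionaries `GAIN`, `ALPHA`, `QUEUE` and all function names are hypothetical.

```python
import itertools
import math

# Hypothetical per-element received powers, weights and queue sizes (bits).
GAIN = {"a": 3.0, "b": 1.0, "c": 0.5}
ALPHA = {"a": 1.0, "b": 2.0, "c": 0.5}
QUEUE = {"a": 1.5, "b": 0.4, "c": 10.0}


def f(subset):
    """Single-RB scalar rate function: log2(1 + total received power)."""
    return math.log2(1.0 + sum(GAIN[e] for e in subset))


def f_trunc(subset):
    """Assumed truncated rank function: min over R of f(R) + sum of Q_e outside R."""
    subset = tuple(subset)
    best = math.inf
    for size in range(len(subset) + 1):
        for kept in itertools.combinations(subset, size):
            rest = sum(QUEUE[e] for e in subset if e not in kept)
            best = min(best, f(kept) + rest)
    return best


def h(subset):
    """Weighted sum rate via the telescoped corner-point expression."""
    order = sorted(subset, key=lambda e: ALPHA[e], reverse=True)
    total = 0.0
    for k in range(1, len(order) + 1):
        next_alpha = ALPHA[order[k]] if k < len(order) else 0.0
        total += (ALPHA[order[k - 1]] - next_alpha) * f_trunc(order[:k])
    return total


def corner_rates(subset):
    """Per-element rates at the corner point ordered by non-increasing weight."""
    order = sorted(subset, key=lambda e: ALPHA[e], reverse=True)
    rates, previous = {}, 0.0
    for k in range(1, len(order) + 1):
        current = f_trunc(order[:k])
        rates[order[k - 1]] = current - previous
        previous = current
    return rates


print(h(["a", "b", "c"]), corner_rates(["a", "b", "c"]))
```

The brute-force minimization inside `f_trunc` is exponential and only suitable for toy inputs; a polynomial-time submodular minimization routine would replace it in practice.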

We will now show that (10) is an NP-hard problem. We will consider instances of the problem where
the number of RBs $N = 1$, all users have identical weights, unit powers, infinite queues and one transmit
antenna each and where the codebook $\mathcal{W}$ is degenerate, i.e., $\mathcal{W} = \{1\}$. Thus, we have
$|\mathcal{E}| = K$. In addition, we assume that the number of receive antennas is equal to the number of users $K$
so that a given input of user channels forms a $K \times K$ matrix, denoted here by
$\mathbf{H} = [\mathbf{h}_1, \cdots, \mathbf{h}_K]$. Further, we will assume only
one knapsack constraint, which in particular is a cardinality constraint on the number of users that can
be scheduled on the one available RB. We will show that the problem specialized to these instances is
also NP-hard so that the original problem is NP-hard. Note that the matroid constraint now becomes
redundant and (10) simplifies to maximizing the sum rate under a cardinality constraint

$$\max_{\substack{\mathbf{D} = \mathrm{diag}\{d_1, \cdots, d_K\} \\ d_k \in \{0,1\}\ \forall\, k\ \&\ \sum_{k=1}^{K} d_k \le C}} \log\big|\mathbf{I} + \mathbf{H}\mathbf{D}\mathbf{H}^{\dagger}\big|, \qquad (14)$$

where $C : 1 \le C \le K$ is the input maximum cardinality. Now using the determinant equality

$$\log\big|\mathbf{I} + \mathbf{H}\mathbf{D}\mathbf{H}^{\dagger}\big| = \log\big|\mathbf{I} + \mathbf{D}\mathbf{H}^{\dagger}\mathbf{H}\mathbf{D}\big| \qquad (15)$$

together with the monotonicity of the objective function, we can re-write (14) as

$$\max_{\substack{\mathbf{D} = \mathrm{diag}\{d_1, \cdots, d_K\} \\ d_k \in \{0,1\}\ \forall\, k\ \&\ \sum_{k=1}^{K} d_k = C}} \log\big|\mathbf{I} + \mathbf{D}\mathbf{H}^{\dagger}\mathbf{H}\mathbf{D}\big|. \qquad (16)$$

Note that (16) is equivalent to determining the $C \times C$ principal sub-matrix of the positive definite matrix
$\mathbf{I} + \mathbf{H}^{\dagger}\mathbf{H}$ having the maximum determinant. Note that for a given $K$, an instance
of the problem in (16) is the matrix $\mathbf{H}$ together with $C$. We will prove that (16) is NP-hard via
contradiction. Suppose now that an efficient algorithm (with a complexity polynomial in $K$) exists that can
optimally solve (16) for any input $K \times K$ matrix $\mathbf{H}$ and any $C : 1 \le C \le K$. This in turn would
imply that there exists an efficient algorithm (with a complexity polynomial in $K$) that for any input
$C : 1 \le C \le K$ and any $K \times K$ positive definite matrix $\mathbf{S}$, can determine the $C \times C$
principal sub-matrix of $\mathbf{S}$ having the maximum determinant. Invoking the reduction developed in [25], this
would then contradict the NP-hardness of the problem of determining whether a given input graph has a clique of a
given input size. □
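The specialized problem used in this reduction is small enough to explore by hand. The sketch below is an illustration under our own assumptions: a hypothetical real-valued $4 \times 4$ channel (so the conjugate transpose reduces to a transpose) and a hand-written determinant routine. It checks the determinant identity numerically, solves the cardinality-constrained problem by exhaustive search, and compares it with a simple incremental greedy selection.

```python
import itertools
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n, result = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


# Hypothetical real-valued channel: entry k is user k's channel vector h_k.
H_COLS = [
    (1.0, 0.2, 0.0, 0.1),
    (0.9, 0.3, 0.1, 0.0),
    (0.0, 1.0, 0.5, 0.2),
    (0.1, 0.0, 0.3, 1.2),
]
K = len(H_COLS)


def rate_receive_side(selected):
    """log2|I + H D H^T|: accumulate h_k h_k^T over the selected users."""
    dim = len(H_COLS[0])
    m = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for k in selected:
        for i in range(dim):
            for j in range(dim):
                m[i][j] += H_COLS[k][i] * H_COLS[k][j]
    return math.log2(det(m))


def rate_principal_minor(selected):
    """log2 of the principal minor of I + H^T H indexed by the selected users."""
    idx = sorted(selected)
    if not idx:
        return 0.0
    sub = [[(1.0 if i == j else 0.0)
            + sum(x * y for x, y in zip(H_COLS[i], H_COLS[j]))
            for j in idx] for i in idx]
    return math.log2(det(sub))


def brute_force(cardinality):
    """Exhaustive search over all user subsets of the given size."""
    return max(itertools.combinations(range(K), cardinality),
               key=rate_principal_minor)


def greedy(cardinality):
    """Add, one at a time, the user giving the largest rate increase."""
    chosen = []
    for _ in range(cardinality):
        best = max((k for k in range(K) if k not in chosen),
                   key=lambda k: rate_principal_minor(chosen + [k]))
        chosen.append(best)
    return chosen


print(brute_force(2), greedy(2))
```

Exhaustive search visits all size-$C$ subsets of the $K$ users, which is exactly the cost the hardness result says cannot be avoided in general; the greedy pass needs only on the order of $CK$ rate evaluations.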

Theorem 2. There is a randomized algorithm whose complexity scales polynomially in $|\mathcal{E}|$ and which yields
a $\frac{e-1}{e^2(M+\Delta+1)+o(M)}$ approximation to (10).

Proof. The key observation is that the partition matroid constraint in (10) can be expressed as $K$ linear
packing constraints (one for each user). Let $\mathbf{A}_P$ denote the resulting $K \times |\mathcal{E}|$ packing
matrix whose $k$-th row corresponds to the $k$-th user. Note that this row has ones in each position for which the
corresponding element $e$ satisfies $u_e = k$ and zeros elsewhere. Together these $K$ packing constraints are sparse
packing constraints wherein in each column a non-zero entry appears only once. Thus, the total $K + L + M$ packing
constraints are sparse constraints in which each element can appear in at most $M + \Delta + 1$ constraints so
that each column can have at most $M + \Delta + 1$ non-zero entries. With this understanding, we can invoke
the randomized algorithm from [6], which is applicable to the maximization of any monotonic submodular
function subject to sparse packing constraints, and obtain the guarantee claimed in the theorem. □
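The column-sparsity count used in this proof can be reproduced on a toy instance. In the sketch below the element-to-user map, the control channel matrix `A_C` (with column sparsity 1) and the interference matrix `A_I` are all hypothetical; the code builds the partition packing matrix and confirms that no column of the stacked constraints has more than the claimed number of non-zero entries.

```python
def partition_packing_matrix(user_of, num_users):
    """K x |E| 0/1 matrix: row k has a one wherever element e has u_e = k."""
    return [[1 if u == k else 0 for u in user_of]
            for k in range(1, num_users + 1)]


def max_column_nonzeros(*matrices):
    """Largest number of non-zero entries in any column of the stacked matrices."""
    num_cols = len(matrices[0][0])
    return max(
        sum(1 for m in matrices for row in m if row[j] != 0)
        for j in range(num_cols)
    )


# Hypothetical instance: 5 elements owned by K = 3 users.
USER_OF = [1, 1, 2, 3, 3]
M, DELTA = 1, 1
A_C = [[1, 1, 1, 0, 0],            # L = 2 binary control channel constraints
       [0, 0, 0, 1, 1]]
A_I = [[0.3, 0.0, 0.5, 0.2, 0.9]]  # M = 1 interference limit constraint

A_P = partition_packing_matrix(USER_OF, 3)
print(max_column_nonzeros(A_P), max_column_nonzeros(A_P, A_C, A_I))
```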

Notice that since any monotonic submodular set function is also monotonic and sub-additive, we can 
infer the following result from Theorem 1. 

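Lemma 3. The function $h(\cdot)$ defined in (7) is sub-additive, i.e.,

$$h(\mathcal{U}) \le h(\mathcal{U}_1) + h(\mathcal{U}_2),\ \forall\, \mathcal{U}_1, \mathcal{U}_2 : \mathcal{U}_1 \cup \mathcal{U}_2 = \mathcal{U}. \qquad (17)$$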

Practical implementation might demand a simpler and combinatorial (deterministic) algorithm.
Unfortunately, as remarked in [5], it is difficult to design combinatorial (deterministic) algorithms that can
combine both matroid and knapsack constraints. Nevertheless in Algorithm I we specialize a well known 
greedy algorithm to our problem of interest (10). Before analyzing the performance of Algorithm I we 
consider the following scenarios that involve simpler modeling of the constraints and are of particular 
interest. We first note that necessary and sufficient conditions for a knapsack constraint (with rational 
valued coefficients) to be a matroid constraint have been derived in [26]. A simple sufficient condition for 
a knapsack constraint to be matroid constraint is the following. 

Lemma 4. The $i$-th knapsack constraint is a matroid constraint if all its strictly positive coefficients are
identical, i.e., $1\{A_{i,j} > 0\} = 1\{A_{i,k} > 0\} \Rightarrow A_{i,j} = A_{i,k},\ \forall\, j, k$.

Then consider the scenarios that are covered by the following assumptions. 

Assumption 1. The control channel overhead constraints are modeled using $L$ knapsack constraints but
where $L$ now represents the number of orthogonal (non-overlapping) control channel regions. Each user
(and hence all its corresponding elements) is associated with only one of these regions. Further, each
constraint corresponds to a cardinality constraint which enforces that no more than a given number of
elements among those associated with the corresponding control region can be scheduled. Notice that these
$L$ control channel overhead constraints are sparse with $\Delta = 1$ and since they satisfy Lemma 4 they are
matroid constraints as well.

Assumption 2. For each adjacent victim BS, the elements of $\mathcal{E}$ are divided into two sets using an
appropriate threshold: one set comprising those which cause high interference and the other one comprising
those which do not. Then a cardinality constraint is imposed only on the set of elements that cause high
interference. Thus, all resulting interference limit constraints (upon considering $M$ victim BSs) satisfy
Lemma 4 and hence are matroid constraints.

The following result provides the worst-case guarantee offered by Algorithm I. 

Theorem 3. The complexity of Algorithm I is $O(K^2 N^4 |\mathcal{W}|)$ and it yields a $\frac{1}{K}$ approximation
to (10). Further, if Assumptions 1 and 2 are satisfied then Algorithm I yields a constant-factor $\frac{1}{2+M}$
approximation to (10).

Proof. We first consider the complexity of Algorithm I and note that since the partition matroid constraint
needs to be satisfied, there can be at most $K$ steps in the repeat-until loop of the algorithm. Also, recall that
the size of the ground set $\mathcal{E}$ is $O(KN^4|\mathcal{W}|)$. Then, at each step we need to compute
$h(\mathcal{S} \cup e)$ for each $e \in \mathcal{E} \setminus \mathcal{S}$ such that $\mathcal{S} \cup e$ satisfies all
the constraints. Thus, the worst-case complexity is $O(K^2 N^4 |\mathcal{W}|)$.

Let us now consider the approximation guarantees. Notice that due to the partition matroid constraint
any optimal solution to (10) cannot contain more than $K$ elements. Then, the subadditivity of $h(\cdot)$
shown in Lemma 3, together with the facts that Algorithm I is monotonic and in its first step selects the
element of $\mathcal{E}$ having the highest weighted rate, suffices to prove the guarantee: the optimal value is at
most $K$ times the largest single-element value, which the output of Algorithm I is at least as large as. On the
other hand, suppose that Assumptions 1 and 2 are satisfied (over all instances). Consider the $L$ control channel
constraints and let $\mathcal{E}_\ell$ denote the set of elements involved in the $\ell$-th control channel
constraint so that $\mathcal{E} = \cup_{\ell=1}^{L} \mathcal{E}_\ell$. Recall that
$\mathcal{E}_\ell \cap \mathcal{E}_{\ell'} = \emptyset$, $\ell \ne \ell'$ and notice that any set $\mathcal{U}$ that
satisfies these $L$ constraints can be expressed as $\mathcal{U} = \cup_{\ell=1}^{L} \mathcal{U}_\ell$, where
$\mathcal{U}_\ell \subseteq \mathcal{E}_\ell : |\mathcal{U}_\ell| \le C_\ell$, $1 \le \ell \le L$, where $C_\ell$ is
the cardinality bound imposed by the $\ell$-th control channel constraint. Thus the $L$ control channel constraints
together are indeed one partition matroid. More importantly, the intersection of this partition matroid with the
one defined in Lemma 1 is also one matroid. This can be verified by observing that all maximal members in this
intersection have the same cardinality of $\min\{K, \sum_{\ell=1}^{L} C_\ell\}$. Finally, combining this matroid
with the other $M$ (interference limit) matroid constraints, we see that the feasible subsets belong to the
intersection of $M+1$ matroids and
hence form a $p$-system where $p = M + 1$. Then invoking the guarantee offered by the greedy algorithm
on a $p$-system [27,28] proves the second part. □

Remark 1. Let us reconsider the submodular maximization problem defined in (14). This problem in fact
also represents a popular transmit antenna selection problem in point-to-point MIMO communications [29].
Indeed, $K$ can be regarded as the total number of available transmit antennas while $C$ then denotes the
number of transmit antennas that have to be selected, and a normalization factor depending on the SNR $\rho$
can be absorbed into the matrix $\mathbf{H}$. Then, our result in Theorem 1 proves that this transmit antenna
selection problem is NP-hard. Next, the greedy Algorithm I when specialized to this problem reduces to a
known incremental successive transmit antenna selection algorithm [29] but for which no approximation
guarantees were hitherto known. Notice that this problem satisfies Assumptions 1 and 2 since the constraint
in (14) can be accommodated using just one control channel knapsack constraint that has equal coefficients
for all users. Then, invoking the result in Theorem 3 (with $M = 0$) we can infer that the greedy Algorithm I
(or equivalently the incremental successive transmit antenna selection algorithm) offers a 1/2 approximation
to the transmit antenna selection problem.

Recall that hitherto we have assumed that computing $h(\mathcal{U})$ for any $\mathcal{U} \subseteq \mathcal{E}$
incurs a unit cost. We can indeed show that Algorithm I has polynomial complexity under a stricter notion that
computing $f(\mathcal{U})$ (instead of $h(\mathcal{U})$) for any $\mathcal{U} \subseteq \mathcal{E}$ incurs a unit
cost.^2 To show this, it suffices to prove that $h(\mathcal{U})$ can be determined with a complexity polynomial in
$|\mathcal{U}|$. A key observation towards this end is that for any $\mathcal{U} \subseteq \mathcal{E}$,
$f'(\mathcal{U})$ in (11) can be computed as

$$f'(\mathcal{U}) = \sum_{e \in \mathcal{U}} Q_e + \min_{\mathcal{R} \subseteq \mathcal{U}} \Big\{ f(\mathcal{R}) - \sum_{e \in \mathcal{R}} Q_e \Big\}. \qquad (18)$$

Then, since the function $f(\mathcal{R}) - \sum_{e \in \mathcal{R}} Q_e$, $\forall\, \mathcal{R} \subseteq \mathcal{U}$
is a submodular set function, we can solve the minimization in (18) using submodular function minimization routines
that have a complexity polynomial in $|\mathcal{U}|$ [30,31]. Thus, from (12) we can conclude that $h(\mathcal{U})$
can indeed be determined with a complexity polynomial in $|\mathcal{U}|$.

^2 This assumption results in no loss of generality since the worst-case cost of computing $f(\mathcal{U})$ is itself
polynomial in $N$ and $K$.

We now propose simple observations that can considerably speed up the greedy algorithm.

• Lazy evaluations. An important feature that speeds up the greedy algorithm substantially has been
discovered and exploited in [32,33]. In particular, due to the submodularity of the objective function
the incremental gain offered by an element over any selected subset of elements not including it
decreases monotonically as the selected subset grows larger. Thus, at any step in the algorithm,
given a set of selected elements $\mathcal{S}$ and an element $e \in \mathcal{E} \setminus \mathcal{S}$ for which
$h(\mathcal{S} \cup e)$ has been evaluated, we do not have to evaluate $h(\mathcal{S} \cup e')$ for another element
$e' \in \mathcal{E} \setminus \mathcal{S}$, if we can assert that
$h(\mathcal{S} \cup e) - h(\mathcal{S}) \ge h(\mathcal{S}' \cup e') - h(\mathcal{S}')$, where
$\mathcal{S}' \subseteq \mathcal{S}$ denotes the set of selected elements at a previous step. This results
in no loss of optimality with respect to the original greedy algorithm.

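• Exploiting subadditivity. Suppose that at any step of the greedy algorithm we have a set of selected
elements $\underline{\mathcal{S}}$. Further, let $e_1 = (u, \mathbf{W}, \mathbf{c}_1)$ and $e_2 = (u, \mathbf{W}, \mathbf{c}_2)$ be two elements in $\underline{\mathcal{E}} \setminus \underline{\mathcal{S}}$ such that
$\mathbf{c}_1$ and $\mathbf{c}_2$ comprise only one chunk each and are mutually non-intersecting. Then, letting $e' =
(u, \mathbf{W}, \mathbf{c}_1 + \mathbf{c}_2)$, we see that

    $h(\underline{\mathcal{S}} \cup e') \leq h(\underline{\mathcal{S}} \cup e_1 \cup e_2) \leq h(\underline{\mathcal{S}} \cup e_1) + h(\underline{\mathcal{S}} \cup e_2),$    (19)

where the first inequality stems from the fact that $h(\underline{\mathcal{S}} \cup e')$ is monotonically increasing in the transmit
PSD of $e'$ and the second inequality stems from the monotonicity and subadditivity of $h(.)$. Thus,
we have that

    $h(\underline{\mathcal{S}} \cup e') \leq 2 \max\{ h(\underline{\mathcal{S}} \cup e_1), h(\underline{\mathcal{S}} \cup e_2) \}.$    (20)

Then, if $\underline{\mathcal{S}} \cup e_1$, $\underline{\mathcal{S}} \cup e_2$ as well as $\underline{\mathcal{S}} \cup e'$ satisfy all the constraints, we can evaluate $h(\underline{\mathcal{S}} \cup e_1)$, $h(\underline{\mathcal{S}} \cup e_2)$
and skip evaluating $h(\underline{\mathcal{S}} \cup e')$. By adopting this procedure over all elements in $\underline{\mathcal{E}} \setminus \underline{\mathcal{S}}$ we can ensure
that the element selected will offer at least 1/2 the gain yielded by the locally optimal element. Then,
using a well known result on the greedy algorithm with an approximately optimal selection at each
step [27, 28], we can conclude that this variation of our greedy algorithm will yield an approximation
guarantee of $\frac{1/2}{1/2 + M + 1}$ when Assumptions 1 and 2 are satisfied.

3 Practical Modulation and Coding Schemes

In the LTE-A uplink a scheduled user can be assigned one out of three modulations (4, 16 & 64 QAM) and
an outer Turbo-code whose coding rate is one out of several available choices. Since the available outer
codes are powerful and since the BS can employ near-optimal receivers (such as Turbo SIC), a reasonable
choice for the achievable rate region is the following. Let $\mathcal{S}_e$ denote the constellation (with unit average
energy and cardinality $S_e$) associated with element $e \in \underline{\mathcal{E}}$. For any subset $\underline{\mathcal{A}} \subseteq \underline{\mathcal{E}}$ and any $n : 1 \leq n \leq N$,
let $I^{(n)}(\underline{\mathcal{A}})$ denote the mutual information evaluated for a point-to-point MIMO channel whose output can
be modeled as

    $\mathbf{y}^{(n)} = \sum_{e \in \underline{\mathcal{A}}} \sqrt{p_e c_e(n)}\, \mathbf{H}_e^{(n)} \mathbf{W}_e \mathbf{x}_e^{(n)} + \mathbf{v}^{(n)},$    (21)

where $\mathbf{v}^{(n)} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ is the additive Gaussian noise and $\mathbf{x}_e^{(n)} \in \mathcal{S}_e^{N_t}$ is the input vector corresponding to
element $e$ whose entries are independently and uniformly drawn from $\mathcal{S}_e$, and where $\mathbf{x}_e^{(n)}$, $\mathbf{x}_{e'}^{(n)}$ are mutually
independent for any $e \neq e'$. Then, for any $\underline{\mathcal{U}} \subseteq \underline{\mathcal{E}}$ an achievable rate region is given by

    $\{ \mathbf{r} = [r_e]_{e \in \underline{\mathcal{U}}} \in \mathbb{R}_+^{|\underline{\mathcal{U}}|} : \sum_{e \in \underline{\mathcal{A}}} r_e \leq \sum_{n=1}^{N} I^{(n)}(\underline{\mathcal{A}}),\ \forall\, \underline{\mathcal{A}} \subseteq \underline{\mathcal{U}} \}.$    (22)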


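Notice that in deriving (22) we have assumed an ideal BS receiver as well as no DFT spreading by each user,
both of which allow for higher achievable rates. (Neglecting the per-user DFT spreading expands the rate
region since the noise at the BS is assumed to be Gaussian and independent across RBs.) Unfortunately, no
closed form expressions are available for $I^{(n)}(\underline{\mathcal{A}})$ and the rate region in (22) does not have a useful
structure. Clearly the region defined before in (5) assuming Gaussian inputs is an outer bound which however
can be loose. Here we obtain a tighter outer bound that also has a useful structure. We first offer the
following result.

Proposition 1. For any subset $\underline{\mathcal{A}} \subseteq \underline{\mathcal{E}}$ and any $n : 1 \leq n \leq N$, we have that

    $I^{(n)}(\underline{\mathcal{A}}) \leq \min_{\underline{\mathcal{R}} \subseteq \underline{\mathcal{A}}} \left\{ \log \left| \mathbf{I} + \sum_{e \in \underline{\mathcal{A}} \setminus \underline{\mathcal{R}}} p_e c_e(n) \mathbf{H}_e^{(n)} \mathbf{W}_e (\mathbf{H}_e^{(n)} \mathbf{W}_e)^{\dagger} \right| + \sum_{e \in \underline{\mathcal{R}}} N_t \log(S_e) \right\}.$    (23)

Further, the set function $g : 2^{\underline{\mathcal{E}}} \to \mathbb{R}_+$ defined as $g(\underline{\mathcal{A}}) = \sum_{n=1}^{N} g^{(n)}(\underline{\mathcal{A}})$, $\forall\, \underline{\mathcal{A}} \subseteq \underline{\mathcal{E}}$, where $g^{(n)}(\underline{\mathcal{A}})$
denotes the right-hand side of (23), is a rank function.

Proof. Consider any $\underline{\mathcal{A}} \subseteq \underline{\mathcal{E}}$, $n : 1 \leq n \leq N$ and the model in (21). Using the chain rule for mutual
information along with the fact that the inputs corresponding to any two distinct elements of $\underline{\mathcal{A}}$ are
mutually independent, we can upper bound $I^{(n)}(\underline{\mathcal{A}})$ as

    $I^{(n)}(\underline{\mathcal{A}}) \leq I^{(n)}(\underline{\mathcal{A}} \setminus \underline{\mathcal{R}}) + \sum_{e \in \underline{\mathcal{R}}} I^{(n)}(e)$

for any $\underline{\mathcal{R}} \subseteq \underline{\mathcal{A}}$. Since the cardinality of the input corresponding to element $e$ is $S_e^{N_t}$, we have that
$I^{(n)}(e) \leq N_t \log(S_e)$. Then, using the fact that for any given input covariance, Gaussian inputs (with the
same covariance) maximize the mutual information (over the Gaussian noise channel model in (21)), we
have that

    $I^{(n)}(\underline{\mathcal{A}} \setminus \underline{\mathcal{R}}) \leq \log \left| \mathbf{I} + \sum_{e \in \underline{\mathcal{A}} \setminus \underline{\mathcal{R}}} p_e c_e(n) \mathbf{H}_e^{(n)} \mathbf{W}_e (\mathbf{H}_e^{(n)} \mathbf{W}_e)^{\dagger} \right|.$

Since these arguments are valid for any subset $\underline{\mathcal{R}} \subseteq \underline{\mathcal{A}}$, we can deduce that (23) is true. The remaining
result follows from basic definitions. □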

In this context, we note that the bound in (23) is a non-trivial generalization of a bound on the
finite alphabet mutual information over a point-to-point fading channel employed in [34] to derive a tight
lower bound on the outage probability. However, that bound when applied to our case would only yield
$I^{(n)}(e) \leq \min\{ \log | \mathbf{I} + p_e c_e(n) \mathbf{H}_e^{(n)} \mathbf{W}_e (\mathbf{H}_e^{(n)} \mathbf{W}_e)^{\dagger} |,\ N_t \log(S_e) \}$ for any $e \in \underline{\mathcal{E}}$.
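For small subsets, the right-hand side of (23) can be evaluated by direct enumeration. The following is a minimal sketch under stated assumptions: the function names and the dictionary-based input format are hypothetical, each effective channel $\sqrt{p_e c_e(n)}\, \mathbf{H}_e^{(n)} \mathbf{W}_e$ is passed in as a precomputed matrix, the number of its columns is taken as $N_t$, and logarithms are taken to base 2.

```python
import itertools
import numpy as np

def gaussian_rate(mats, n_rx):
    """log2 det(I + sum_e G_e G_e^H) for the effective channels in `mats`."""
    cov = np.eye(n_rx, dtype=complex)
    for g in mats:
        cov = cov + g @ g.conj().T
    _, logdet = np.linalg.slogdet(cov)
    return logdet / np.log(2.0)

def finite_alphabet_bound(eff, card, n_rx):
    """Brute-force evaluation of the right-hand side of (23) on one RB.

    eff  : dict mapping element -> effective channel matrix (hypothetical format)
    card : dict mapping element -> constellation cardinality S_e
    """
    elems = list(eff)
    best = float("inf")
    # Enumerate every subset R of A whose elements are bounded by the
    # constellation cap; the remaining elements get the Gaussian bound.
    for k in range(len(elems) + 1):
        for capped in itertools.combinations(elems, k):
            rest = [eff[e] for e in elems if e not in capped]
            cap = sum(eff[e].shape[1] * np.log2(card[e]) for e in capped)
            best = min(best, gaussian_rate(rest, n_rx) + cap)
    return best
```

The enumeration is exponential in the subset size, so this is only meant to illustrate how the two terms in (23) trade off: elements at high SNR are limited by their constellation size, elements at low SNR by the Gaussian term.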

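Next, we outer bound the region in (22) as

    $\mathcal{P}(\underline{\mathcal{U}}, g) = \{ \mathbf{r} = [r_e]_{e \in \underline{\mathcal{U}}} \in \mathbb{R}_+^{|\underline{\mathcal{U}}|} : \sum_{e \in \underline{\mathcal{A}}} r_e \leq g(\underline{\mathcal{A}}),\ \forall\, \underline{\mathcal{A}} \subseteq \underline{\mathcal{U}} \}.$    (24)

Invoking Proposition 1, we use the fact that $g(.)$ is a rank function, from which it follows that the region
$\mathcal{P}(\underline{\mathcal{U}}, g)$ is a polymatroid. Then, invoking Lemma 2, we can infer the following result.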

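Proposition 2. For any choice of selected elements $\underline{\mathcal{U}} \subseteq \underline{\mathcal{E}}$, the rate region $\mathcal{T}(\underline{\mathcal{U}}, g') = \mathcal{P}(\underline{\mathcal{U}}, g) \cap \mathcal{B}(\underline{\mathcal{U}})$ is
a polymatroid which is characterized by the rank function

    $g'(\underline{\mathcal{A}}) = \min_{\underline{\mathcal{R}} \subseteq \underline{\mathcal{A}}} \{ g(\underline{\mathcal{A}} \setminus \underline{\mathcal{R}}) + \sum_{e \in \underline{\mathcal{R}}} Q_e \},\ \forall\, \underline{\mathcal{A}} \subseteq \underline{\mathcal{U}}.$    (25)

Then, upon defining $h'(\underline{\mathcal{U}})$ as the maximum over $\mathbf{r} \in \mathcal{T}(\underline{\mathcal{U}}, g')$, in the same manner as $h(\underline{\mathcal{U}})$ was defined before,
we consider the optimization problem

    $\max_{\underline{\mathcal{U}} \in \underline{\mathcal{I}}} \{ h'(\underline{\mathcal{U}}) \}$ s.t.
    $\mathbf{A}_I \mathbf{x}_{\underline{\mathcal{U}}} \leq \mathbf{1}_M;\ \mathbf{A}_C \mathbf{x}_{\underline{\mathcal{U}}} \leq \mathbf{1}_L.$    (26)

As before, it can be shown that the optimization problem in (26) is the maximization of a monotonic
submodular function subject to one matroid and multiple knapsack constraints. Algorithm I and its
associated results are thus applicable.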
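The rank function in (25) can be checked numerically on a toy instance. The sketch below is illustrative only: the helper names are hypothetical, the limits $Q_e$ are arbitrary numbers, and the rank function $g$ is replaced by an assumed stand-in (the sum capacity of a single-antenna multiple access channel) rather than the function defined via (23). It evaluates (25) by brute force and verifies submodularity exhaustively.

```python
import itertools
import math

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in itertools.combinations(items, k):
            yield frozenset(combo)

def truncated_rank(g, q, a):
    """g'(A) = min over R subset of A of g(A minus R) + sum_{e in R} Q_e, as in (25)."""
    a = frozenset(a)
    return min(g(a - r) + sum(q[e] for e in r) for r in subsets(a))

def is_submodular(fn, ground, tol=1e-9):
    """Exhaustively check fn(A) + fn(B) >= fn(A | B) + fn(A & B)."""
    all_sets = list(subsets(ground))
    return all(fn(a) + fn(b) + tol >= fn(a | b) + fn(a & b)
               for a in all_sets for b in all_sets)

# Toy data (assumptions for illustration): per-element SNRs and limits Q_e.
snr = {"e1": 15.0, "e2": 3.0, "e3": 1.0}
q = {"e1": 1.5, "e2": 5.0, "e3": 0.4}

def g(a):
    """Stand-in rank function: log2(1 + total SNR of the elements in a)."""
    return math.log2(1.0 + sum(snr[e] for e in a))

def g_prime(a):
    return truncated_rank(g, q, a)
```

For a single element the minimization reduces to $\min\{g(e), Q_e\}$, which makes the role of the limits $Q_e$ as per-element caps easy to see.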



4 Simulation Results 

In this section we present our simulation results. We simulate an uplink with 10 users, wherein the BS 
is equipped with four receive antennas. The system has 280 sub-carriers divided into 20 RBs (of size 14 
sub-carriers each) available as data subcarriers that are used for serving the users. We assume 10 active 
users, all of whom have identical maximum transmit powers. We use the SCM urban macro channel model
(with co-polarized antennas having $10\lambda$ and $1\lambda$ separation at the BS and the mobile (user), respectively, and
15° BS mean angular spread) to generate the channel between each user and the base station. In all the
results given below we assume an infinitely backlogged traffic model. 

In Fig. 2, we assume that each user is equipped with two transmit antennas and can use an antenna
selection codebook, i.e., $\mathcal{W} = \{[1; 0], [0; 1]\}$. The BS employs the optimal receiver and each user can employ
an unconstrained (Gaussian) input alphabet. For simplicity, we assume no interference limit constraints
and consider one control channel overhead constraint which imposes that no more than seven users can be
scheduled. We plot the average cell spectral efficiency curves obtained when Algorithm I is employed by
the BS scheduler with and without the control channel overhead constraint (denoted respectively by
Algo-I-limit-AS and Algo-I-AS). Also plotted are the corresponding spectral efficiency curves obtained when
each user has only one transmit antenna (denoted respectively by Algo-I-limit and Algo-I). For each curve,
we plot a corresponding upper bound by specializing a data-dependent upper bound from [32] which is
applicable to any submodular function maximization (see also [33]). From the figure we observe that, with
and without antenna selection, the performance of Algorithm I is within 68-75% of the data-dependent
upper bound, which is superior to the worst-case guarantee of 1/2 (obtained by specializing the result in
Theorem 3).
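To illustrate how a data-dependent upper bound of this kind can be computed, the following sketch uses the standard bound for a monotone submodular objective under a plain cardinality constraint: the optimal value is at most the value of the current solution plus the sum of the $k$ largest marginal gains with respect to it. This is a simplified setting chosen for illustration (the toy coverage objective, the function names and the cardinality constraint are assumptions), not the exact specialization used to produce Fig. 2.

```python
import itertools

def coverage(sets, chosen):
    """Monotone submodular toy objective: number of items covered."""
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)

def greedy(sets, k):
    """Plain greedy selection of at most k sets."""
    chosen = []
    for _ in range(k):
        base = coverage(sets, chosen)
        gains = {e: coverage(sets, chosen + [e]) - base
                 for e in sets if e not in chosen}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        chosen.append(best)
    return chosen

def data_dependent_bound(sets, chosen, k):
    """Value of `chosen` plus the k largest marginal gains with respect to it."""
    base = coverage(sets, chosen)
    gains = sorted((coverage(sets, chosen + [e]) - base
                    for e in sets if e not in chosen), reverse=True)
    return base + sum(gains[:k])
```

The ratio of the greedy value to this bound gives an instance-specific performance certificate, which is typically much better than the worst-case guarantee.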

5 Conclusions 

We considered resource allocation in the 3GPP LTE-A cellular uplink which allows for MIMO transmission 
from each scheduled user as well as multi-user scheduling wherein multiple users can be assigned the same 
time-frequency resource. We showed that the resulting resource allocation problem is NP-hard and then 
proposed constant-factor polynomial-time approximation algorithms.

Figure 1: A Feasible RB Allocation in the LTE-A UL



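Table 1: Algorithm I: Greedy Algorithm for LTE-A UL MU-MIMO

1: Initialize $\underline{\mathcal{S}} = \phi$
2: Repeat
3: Determine

       $\hat{e} = \arg\max_{e \in \underline{\mathcal{E}} :\ \underline{\mathcal{S}} \cup e \in \underline{\mathcal{I}};\ \mathbf{A}_I \mathbf{x}_{\underline{\mathcal{S}} \cup e} \leq \mathbf{1}_M;\ \mathbf{A}_C \mathbf{x}_{\underline{\mathcal{S}} \cup e} \leq \mathbf{1}_L} \{ h(\underline{\mathcal{S}} \cup e) \}$    (27)

   and set $v = h(\underline{\mathcal{S}} \cup \hat{e}) - h(\underline{\mathcal{S}})$.
4: If $v > 0$ Then
5: $\underline{\mathcal{S}} \leftarrow \underline{\mathcal{S}} \cup \hat{e}$
6: End If
7: Until $v \leq 0$ or $\hat{e} = \phi$
8: Output $\underline{\mathcal{S}}$.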
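The greedy loop of Table 1, combined with the lazy-evaluation idea described in Section 2, can be sketched on the transmit antenna selection special case of Remark 1. This is a minimal illustration under stated assumptions: the only constraint is a cardinality budget, the objective is $\log_2 |\mathbf{I} + \mathbf{H}_S \mathbf{H}_S^{\dagger}|$ with any SNR normalization already absorbed into $\mathbf{H}$, and the function names are hypothetical.

```python
import heapq
import numpy as np

def logdet_rate(h, cols):
    """log2 det(I + H_S H_S^H) using the columns (antennas) listed in `cols`."""
    n_rx = h.shape[0]
    hs = h[:, sorted(cols)]
    _, logdet = np.linalg.slogdet(np.eye(n_rx) + hs @ hs.conj().T)
    return logdet / np.log(2.0)

def lazy_greedy(h, budget):
    """Greedy selection of at most `budget` antennas with lazy evaluations."""
    selected, value, evals = [], 0.0, 0
    # Heap entries are (negated gain, element, step at which the gain was
    # computed). By submodularity a stale gain upper-bounds the current gain.
    heap = [(-float("inf"), e, -1) for e in range(h.shape[1])]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        neg_gain, e, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date and beats every stale upper bound,
            # so e is the element the plain greedy algorithm would pick.
            if -neg_gain <= 0:
                break
            selected.append(e)
            value += -neg_gain
            continue
        # Stale entry: re-evaluate against the current selection and re-insert.
        gain = logdet_rate(h, selected + [e]) - value
        evals += 1
        heapq.heappush(heap, (-gain, e, len(selected)))
    return selected, value, evals
```

After the first step, only elements whose stale gains reach the top of the heap are re-evaluated, which is where the saving over re-evaluating every remaining element comes from. The selection matches the plain greedy algorithm whenever there are no exact ties in the gains.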
Figure 2: Average spectral efficiency versus SNR (dB): LTE-A MU-MIMO Scheduling.